Prediction of Gene Function Using Ensembles of SVMs and Heterogeneous Data Sources

Authors

  • Matteo Ré
  • Giorgio Valentini

Abstract

The ever-increasing amount of biomolecular data available in public-domain databases for a broad range of organisms, coupled with recent advances in machine learning research, has stimulated interest in computational approaches to gene function prediction. In this context, integrating data from heterogeneous biomolecular sources plays a key role. In this contribution we test the performance of several ensembles of SVM classifiers. In each ensemble, every component learner is trained on a different type of data, and the learners' outputs are then combined using different aggregation techniques. The combination methods compared are the widely adopted linear weighted combination, the logarithmic weighted combination, and the similarity-based decision templates approach. The results show that heterogeneous data integration through ensemble methods is a valuable line of research in gene function prediction.
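The abstract names three ways of merging the outputs of SVMs trained on different data sources. The sketch below gives one plausible reading of each. It rests on several assumptions that the abstract does not state:

- Each functional class is treated as a separate binary problem.
- Every base SVM emits a score in [0, 1], for example a calibrated probability.
- The caller supplies the per-source weights, for instance derived from each learner's validation performance.
- The decision-template similarity is the squared-Euclidean measure, one of the measures proposed for decision templates by Kuncheva et al.

All function names and numbers are hypothetical. This is not the authors' implementation.

```python
import math


def linear_combination(scores, weights):
    """Weighted linear combination: weighted mean of per-source scores in [0, 1]."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * s for w, s in zip(weights, scores)) / total


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logarithmic_combination(scores, weights, eps=1e-12):
    """Weighted logarithmic (geometric) pooling.

    Positive-class support is prod(s_t ** w_t) and negative-class support is
    prod((1 - s_t) ** w_t). The two are normalized to sum to 1 and the
    positive share is returned. Work is done in log space; eps guards log(0).
    """
    log_pos = sum(w * math.log(max(s, eps)) for w, s in zip(weights, scores))
    log_neg = sum(w * math.log(max(1.0 - s, eps)) for w, s in zip(weights, scores))
    # P / (P + N) == sigmoid(log P - log N)
    return _sigmoid(log_pos - log_neg)


def fit_decision_templates(profiles, labels):
    """Decision template per class = mean decision profile of its training genes.

    profiles: list of score vectors (one score per data source / base SVM)
    labels:   list of 0/1 class labels (hypothetical binary encoding)
    """
    templates = {}
    for c in (0, 1):
        rows = [p for p, y in zip(profiles, labels) if y == c]
        if not rows:
            raise ValueError(f"no training profiles for class {c}")
        # column-wise mean over the training genes of class c
        templates[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return templates


def decision_template_predict(profile, templates):
    """Assign the class whose template is most similar to the profile.

    Similarity = 1 - mean squared difference. With complementary binary
    outputs (s, 1 - s) this equals the full T x 2 decision-profile version.
    """
    def similarity(template):
        sq = sum((a - b) ** 2 for a, b in zip(profile, template))
        return 1.0 - sq / len(template)

    return max(templates, key=lambda c: similarity(templates[c]))


if __name__ == "__main__":
    # Hypothetical scores for one gene from three data sources.
    scores = [0.9, 0.4, 0.7]
    weights = [2.0, 1.0, 1.0]  # hypothetical, e.g. from validation performance
    print(linear_combination(scores, weights))
    print(logarithmic_combination(scores, weights))

    # Hypothetical training profiles: two positive and two negative genes.
    train = [[0.9, 0.8, 0.7], [0.8, 0.6, 0.9], [0.2, 0.3, 0.1], [0.1, 0.4, 0.3]]
    labels = [1, 1, 0, 0]
    templates = fit_decision_templates(train, labels)
    print(decision_template_predict(scores, templates))
```

With these illustrative numbers, the linear combiner returns about 0.725 and the logarithmic combiner about 0.99. The logarithmic rule multiplies the per-source supports instead of averaging them, so when every source assigns little support to the negative class, the pooled score is pushed much closer to 1. The decision-template rule uses the combiners' training behaviour rather than fixed weights: it compares a gene's whole score vector against the mean vector of each class.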

Similar resources

Integration of heterogeneous data sources for gene function prediction using decision templates and ensembles of learning machines

Several solutions have been proposed to exploit the availability of heterogeneous sources of biomolecular data for gene function prediction, but little attention has been devoted to evaluating the potential improvement in functional classification results that could be achieved through data fusion by means of ensemble-based techniques. In this contribution we test the performance...

Pooling homogeneous ensembles to build heterogeneous ensembles

In ensemble methods, the outputs of a collection of diverse classifiers are combined in the expectation that the global prediction be more accurate than the individual ones. Heterogeneous ensembles consist of predictors of different types, which are likely to have different biases. If these biases are complementary, the combination of their decisions is beneficial. In this work, a family of het...

Bio-molecular cancer prediction with random subspace ensembles of support vector machines

Support Vector Machines (SVMs) and other supervised learning techniques have been applied to the bio-molecular diagnosis of malignancies, also in combination with feature selection methods. The classification task is particularly difficult because of the high dimensionality and low cardinality of gene expression data. In this paper we investigate a different approach based on random subspace ensembles...

An Application of Low Bias Bagged SVMs to the Classification of Heterogeneous Malignant Tissues

DNA microarray data are characterized by high-dimensional, small-sized samples, as usually only a few tens of DNA microarray experiments, each involving thousands of genes, are available for data processing. Considering also the large biological variability of gene expression and the noise introduced by the bio-technological machinery, we need robust and variance-reducing data analysis metho...

Predicting protein function and other biomedical characteristics with heterogeneous ensembles.

Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally quite difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, the appropriateness and data quality of the variables and measurements used for prediction, as well as a lack of consensus regarding the ideal predictor for specific problems. In such scenarios...

Publication date: 2009